We present DeepFusion, a modular multi-modal architecture that fuses lidar, camera, and radar in different combinations for 3D object detection. Specialized feature extractors take advantage of each modality and can be exchanged easily, making the approach simple and flexible. Extracted features are transformed into bird's-eye view as a common representation for fusion. Spatial and semantic alignment is performed before the modalities are fused in feature space. Finally, a detection head exploits the rich multi-modal features to improve 3D detection performance. Experimental results for lidar-camera, lidar-camera-radar, and camera-radar fusion show the flexibility and effectiveness of our fusion approach. In the process, we study the largely unexplored task of faraway car detection up to 225 meters, showing the benefits of lidar-camera fusion. Furthermore, we investigate the density of lidar points required for 3D object detection and illustrate the implications with an example of robustness to adverse weather conditions. Moreover, ablation studies on our camera-radar fusion highlight the importance of accurate depth estimation.
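To make the fusion step concrete, here is a minimal sketch of concatenating per-modality bird's-eye-view feature maps before a shared detection head; the module, channel sizes, and the simple 1x1-convolution mixing are illustrative stand-ins for the paper's alignment and fusion blocks, not its actual implementation.

    # Minimal sketch of BEV-level multi-modal fusion (hypothetical module
    # and tensor names; the paper's exact extractors and alignment differ).
    import torch
    import torch.nn as nn

    class BEVFusion(nn.Module):
        """Fuse per-modality BEV feature maps before a detection head."""
        def __init__(self, in_channels_per_modality, fused_channels=256):
            super().__init__()
            total = sum(in_channels_per_modality)
            # 1x1 conv mixes channels after concatenation, a simple
            # stand-in for the paper's spatial/semantic alignment blocks.
            self.fuse = nn.Sequential(
                nn.Conv2d(total, fused_channels, kernel_size=1),
                nn.BatchNorm2d(fused_channels),
                nn.ReLU(inplace=True),
            )

        def forward(self, bev_features):
            # bev_features: list of (B, C_i, H, W) maps, already warped
            # into a common bird's-eye-view grid (spatially aligned).
            return self.fuse(torch.cat(bev_features, dim=1))

    # Example: lidar (64 ch), camera (128 ch), radar (32 ch) BEV maps.
    fusion = BEVFusion([64, 128, 32])
    feats = [torch.randn(2, c, 200, 200) for c in (64, 128, 32)]
    fused = fusion(feats)  # (2, 256, 200, 200), input to the detection head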
This paper presents an approach for learning the Cartesian velocity of objects with an object detection network on automotive radar data. The proposed method is self-supervised in that it generates its own training signal for the velocities. Labels are required only for single-frame oriented bounding boxes (OBBs); expensive labels for Cartesian velocities or contiguous sequences are not needed. The general idea is to first pre-train an object detection network without velocities using the single-frame OBB labels, and then exploit the network's OBB predictions on unlabelled data for velocity training. In detail, the network's OBB predictions on an unlabelled frame are propagated to the timestamp of a labelled frame using the predicted velocities, and the distances between these propagated OBBs and the OBB predictions on the labelled frame are used to generate a self-supervised training signal for the velocities. The detection network architecture is extended by a module that accounts for the temporal relation of multiple scans and a module that explicitly represents the radar's radial velocity measurements. A two-step approach is used: first training OBB detection only, then training OBB detection and velocities jointly. Furthermore, pre-training with pseudo-labels generated from the radar's radial velocity measurements bootstraps the self-supervised method of this paper. Experiments on the publicly available nuScenes dataset show that the proposed method almost reaches the velocity estimation performance of fully supervised training, but does not require expensive velocity labels. In addition, we outperform a baseline method that uses only radial velocity measurements as labels.
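The core of the self-supervised signal can be sketched in a few lines: propagate box centers predicted on the unlabelled frame to the labelled frame's timestamp under a constant-velocity assumption, then penalize the distance to matched predictions there. Matching and the full OBB parameterization are omitted, and all names are illustrative.

    # Hedged sketch of the self-supervised velocity training signal.
    import torch

    def velocity_loss(centers_unlab, vel_pred, centers_lab, dt):
        """
        centers_unlab: (N, 2) x/y box centers on the unlabelled frame
        vel_pred:      (N, 2) predicted Cartesian velocities
        centers_lab:   (N, 2) matched box predictions on the labelled frame
        dt:            time difference between the two frames in seconds
        """
        propagated = centers_unlab + vel_pred * dt  # constant-velocity update
        return torch.mean(torch.norm(propagated - centers_lab, dim=1))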
This paper presents novel hybrid architectures that combine grid-based and point-based processing to improve the detection performance and orientation estimation of radar-based object detection networks. Purely grid-based detection models operate on a bird's-eye view (BEV) projection of the input point cloud. These approaches suffer a loss of detailed information through the discrete grid resolution. This applies in particular to radar object detection, where relatively coarse grid resolutions are commonly used to account for the sparsity of radar point clouds. In contrast, point-based models are not affected by this problem, as they process point clouds without discretization. However, they generally exhibit worse detection performance than grid-based methods. We show that a point-based model can extract neighborhood features, exploiting the exact relative positions of points, before grid rendering. This yields significant benefits for a subsequent grid-based convolutional detection backbone. In experiments on the public nuScenes dataset, our hybrid architecture achieves improvements over networks from previous literature in detection performance (19.7% higher mAP for the car class than the next-best radar-only submission) and orientation estimates (11.5% relative orientation improvement).
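A rough sketch of the hybrid idea follows, assuming per-point features are max-pooled into BEV cells after point-based enrichment; the actual point module and grid rendering in the paper may differ, and all shapes and names are illustrative.

    # Scatter point features into a coarse BEV grid for a conv backbone.
    import torch

    def scatter_to_bev(points_xy, point_feats, grid_size, cell):
        # points_xy: (N, 2) exact positions; point_feats: (N, C)
        # neighborhood features extracted at those positions.
        H, W = grid_size
        ix = (points_xy[:, 0] / cell).long().clamp(0, W - 1)
        iy = (points_xy[:, 1] / cell).long().clamp(0, H - 1)
        flat = iy * W + ix
        bev = torch.zeros(H * W, point_feats.shape[1])
        # Max-pool features of all points that fall into the same cell.
        bev.scatter_reduce_(0, flat.unsqueeze(1).expand_as(point_feats),
                            point_feats, reduce="amax", include_self=False)
        return bev.view(H, W, -1).permute(2, 0, 1)  # (C, H, W)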
The Cosmology and Astrophysics with MachinE Learning Simulations (CAMELS) project was developed to combine cosmology with astrophysics through thousands of cosmological hydrodynamic simulations and machine learning. CAMELS contains 4,233 cosmological simulations, 2,049 N-body and 2,184 state-of-the-art hydrodynamic simulations, sampling a vast volume in parameter space. In this paper, we present the CAMELS public data release, describing the characteristics of the CAMELS simulations and the various data products generated from them, including halo, subhalo, galaxy, and void catalogues, power spectra, bispectra, Lyman-$\alpha$ spectra, probability distribution functions, halo radial profiles, and X-ray photon lists. We also release catalogues containing billions of galaxies from CAMELS-SAM: a large collection of N-body simulations combined with the Santa Cruz semi-analytic model. We release all the data, comprising more than 350 terabytes and containing 143,922 snapshots, millions of halos and galaxies, and summary statistics. We provide further technical details on how to access, download, read, and process the data at https://camels.readthedocs.io.
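As a hedged illustration of working with the released data, the snippet below reads one snapshot with h5py; the file name and HDF5 field paths are assumptions for illustration only, and the authoritative layout is documented at https://camels.readthedocs.io.

    # Hypothetical example of reading a CAMELS snapshot (assumed names).
    import h5py

    with h5py.File("snapshot_033.hdf5", "r") as f:       # assumed file name
        boxsize = f["Header"].attrs["BoxSize"]           # assumed attribute
        pos = f["PartType0/Coordinates"][:]              # assumed field path
        print(boxsize, pos.shape)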
Most self-supervised monocular depth estimation methods focus on driving scenarios. We show that such methods generalize poorly to unseen complex indoor scenes, where objects are cluttered and arbitrarily arranged in the near field. To obtain more robustness, we propose a structure distillation approach that learns from a pretrained depth estimator which, owing to its in-the-wild mixed-dataset training, produces structured but metric-agnostic depth. By combining distillation with a self-supervised branch that learns metric scale from left-right consistency, we obtain structured and metric depth for generic indoor scenes and run inference in real time. To facilitate learning and evaluation, we collect SimSIN, a simulation dataset with thousands of environments, and UniSIN, a dataset containing about 500 real scan sequences of generic indoor environments. We experiment in sim-to-real and real-world settings and show improvements both qualitatively and quantitatively, as well as in downstream applications that use our depth maps. This work provides a full study covering methods, data, and applications, and we believe it lays a solid foundation for practical indoor depth estimation via self-supervision.
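A simplified sketch of how the two training signals might be combined, assuming an L1 distillation term against the expert's metric-agnostic depth and a photometric left-right term, with an illustrative weighting; the paper's actual loss terms and weights are not reproduced here.

    # Combine structure distillation with a self-supervised metric branch.
    import torch
    import torch.nn.functional as F

    def total_loss(pred_depth, expert_depth, left_img, right_warped, w=0.5):
        # Distillation: match the structure of the expert depth (in
        # practice a scale-invariant or gradient-based term may be used).
        distill = F.l1_loss(pred_depth, expert_depth)
        # Self-supervision: photometric error between the left image and
        # the right image warped into the left view using pred_depth.
        photo = F.l1_loss(left_img, right_warped)
        return w * distill + (1 - w) * photo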
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
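A minimal sketch of the NAIVEATTACK idea, poisoning a fraction of the raw images with a fixed corner patch (and flipping their labels) before the distillation step runs; the trigger design, poison rate, and parameter names are illustrative, and DOORPING's iterative trigger updates are not shown.

    # Stamp a trigger on a subset of the raw data prior to distillation.
    import numpy as np

    def poison(images, labels, target_class, rate=0.1, patch=3):
        # images: (N, H, W, C) floats in [0, 1]; labels: (N,) ints
        n = int(len(images) * rate)
        idx = np.random.choice(len(images), n, replace=False)
        images[idx, -patch:, -patch:, :] = 1.0   # white square trigger
        labels[idx] = target_class               # attacker-chosen label
        return images, labels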
translated by 谷歌翻译
We present a dynamic path planning algorithm to navigate an amphibious rotor craft through a concave time-invariant obstacle field while attempting to minimize energy usage. We create a nonlinear quaternion state model that represents the rotor craft dynamics above and below the water. The 6-degree-of-freedom dynamics are used within a layered architecture to generate motion paths for the vehicle to follow, together with the required control inputs. The rotor craft has a 3-dimensional map of its surroundings that is updated via limited-range onboard sensor readings within the current medium (air or water). Path planning is done via PRM and D* Lite.
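For illustration, the quaternion attitude kinematics at the heart of such a 6-DOF model can be integrated as below; this explicit-Euler step with renormalization is a generic sketch under standard Hamilton conventions, not the authors' implementation.

    # Quaternion attitude kinematics: q_dot = 0.5 * q (x) [0, omega].
    import numpy as np

    def quat_mul(q, r):
        w1, x1, y1, z1 = q
        w2, x2, y2, z2 = r
        return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                         w1*x2 + x1*w2 + y1*z2 - z1*y2,
                         w1*y2 - x1*z2 + y1*w2 + z1*x2,
                         w1*z2 + x1*y2 - y1*x2 + z1*w2])

    def step_attitude(q, omega_body, dt):
        # One explicit-Euler integration step of the attitude quaternion.
        q_dot = 0.5 * quat_mul(q, np.concatenate(([0.0], omega_body)))
        q = q + q_dot * dt
        return q / np.linalg.norm(q)   # renormalize to a unit quaternion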
translated by 谷歌翻译
While the capabilities of autonomous systems have been steadily improving in recent years, these systems still struggle to rapidly explore previously unknown environments without the aid of GPS-assisted navigation. The DARPA Subterranean (SubT) Challenge aimed to fast-track the development of autonomous exploration systems by evaluating their performance in real-world underground search-and-rescue scenarios. Subterranean environments present a plethora of challenges for robotic systems, such as limited communications, complex topology, visually-degraded sensing, and harsh terrain. The presented solution enables long-term autonomy with minimal human supervision by combining a powerful and independent single-agent autonomy stack with higher-level mission management operating over a flexible mesh network. The autonomy suite deployed on quadruped and wheeled robots was fully independent, freeing the human supervisor to loosely oversee the mission and make high-impact strategic decisions. We also discuss lessons learned from fielding our system at the SubT Final Event, relating to vehicle versatility, system adaptability, and re-configurable communications.
translated by 谷歌翻译
We present Muse, a text-to-image Transformer model that achieves state-of-the-art image generation performance while being significantly more efficient than diffusion or autoregressive models. Muse is trained on a masked modeling task in discrete token space: given the text embedding extracted from a pre-trained large language model (LLM), Muse is trained to predict randomly masked image tokens. Compared to pixel-space diffusion models, such as Imagen and DALL-E 2, Muse is significantly more efficient due to the use of discrete tokens and requiring fewer sampling iterations; compared to autoregressive models, such as Parti, Muse is more efficient due to the use of parallel decoding. The use of a pre-trained LLM enables fine-grained language understanding, translating to high-fidelity image generation and the understanding of visual concepts such as objects, their spatial relationships, pose, cardinality etc. Our 900M parameter model achieves a new SOTA on CC3M, with an FID score of 6.06. The Muse 3B parameter model achieves an FID of 7.88 on zero-shot COCO evaluation, along with a CLIP score of 0.32. Muse also directly enables a number of image editing applications without the need to fine-tune or invert the model: inpainting, outpainting, and mask-free editing. More results are available at https://muse-model.github.io
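A toy sketch of the masked token modeling objective described above: mask a random subset of discrete image tokens and train the model to predict them conditioned on the text embedding. The model interface, mask rate, and names are placeholders rather than Muse's actual code.

    # Masked token prediction with cross-entropy on masked positions only.
    import torch
    import torch.nn.functional as F

    def masked_token_loss(model, image_tokens, text_emb, mask_id,
                          mask_rate=0.5):
        # image_tokens: (B, L) discrete codes from a VQ image tokenizer
        mask = torch.rand(image_tokens.shape) < mask_rate
        inputs = image_tokens.masked_fill(mask, mask_id)
        logits = model(inputs, text_emb)        # (B, L, vocab_size)
        # Supervise only the positions that were masked out.
        return F.cross_entropy(logits[mask], image_tokens[mask])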
translated by 谷歌翻译
The visual dimension of cities has been a fundamental subject in urban studies, since the pioneering work of scholars such as Sitte, Lynch, Arnheim, and Jacobs. Several decades later, big data and artificial intelligence (AI) are revolutionizing how people move, sense, and interact with cities. This paper reviews the literature on the appearance and function of cities to illustrate how visual information has been used to understand them. A conceptual framework, Urban Visual Intelligence, is introduced to systematically elaborate on how new image data sources and AI techniques are reshaping the way researchers perceive and measure cities, enabling the study of the physical environment and its interactions with socioeconomic environments at various scales. The paper argues that these new approaches enable researchers to revisit the classic urban theories and themes, and potentially help cities create environments that are more in line with human behaviors and aspirations in the digital age.
translated by 谷歌翻译